This tutorial will teach you how to configure a multi-node Cassandra cluster on a VPS. Cassandra is a highly scalable open source database system that achieves great performance when set up with multiple nodes, even across different data centers.
Before configuring each node, make sure Cassandra is installed on every one of them. We have an easy tutorial on how to do that on a VPS. Once Cassandra is installed on every node, make sure it isn't running. To check for a running Cassandra process, type in:
sudo ps auwx | grep cassandra
If a process other than the "grep" command itself appears, copy its process ID and kill it:
sudo kill -9 PID
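The manual check above (read the `ps` listing, ignore the grep line, note the PID) can also be sketched in Python. This is only an illustration: the function name `cassandra_pids` is hypothetical, and it relies on the assumption that the PID is the second column of `ps auwx` output and that a real Cassandra command line does not itself contain the word "grep".

```python
import subprocess


def cassandra_pids(ps_output):
    """Return the PIDs of lines mentioning cassandra, skipping any grep line."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything without a numeric PID column.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        # Heuristic (assumption): ignore the grep command that did the search.
        if "cassandra" in line and "grep" not in line:
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    # Same listing as the command above, captured as text.
    listing = subprocess.run(
        ["ps", "auwx"], capture_output=True, text=True, check=True
    ).stdout
    print(cassandra_pids(listing))
```

The script only reports PIDs; stopping the processes is still done with the kill command shown above.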
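You'll also need to clear the data Cassandra has already written. Note that this permanently deletes everything under /var/lib/cassandra. Do so by running: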
sudo rm -rf /var/lib/cassandra/*
To configure Cassandra for multiple nodes, you need to know in advance how many nodes you will use, and calculate a token for each one. We've developed a tool to do this, and you can get it here. Enter the number of nodes you're working with and it will give you a token for each node. For example, with three nodes you'd get these numbers:
Node 0: 0
Node 1: 3074457345618258602
Node 2: 6148914691236517205
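The tool's formula isn't stated here, but the three example tokens are consistent with spacing the nodes evenly over the range 0 to 2^63, that is, token i = i * 2^63 / N rounded down. The Python sketch below reproduces them under that assumption. The function name `tokens_for` and the `ring_size` parameter are hypothetical, and the right range depends on the partitioner your Cassandra version uses, so treat the output as something to check rather than as authoritative.

```python
def tokens_for(node_count, ring_size=2**63):
    """Return evenly spaced initial tokens, one per node (assumed formula)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic only: floats cannot represent numbers this large exactly.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(tokens_for(3)):
        print(f"Node {index}: {token}")
```

Calling `tokens_for(3)` gives the same three values as the example above.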
Now you'll need to edit your configuration file for each node. To do so, open the nano text editor by running:
nano ~/cassandra/conf/cassandra.yaml
Some of the settings you'll edit are the same on every node (cluster_name, seed_provider, rpc_address and endpoint_snitch), while others differ per node (initial_token and listen_address). Choose one node to be your seed node, then find the lines for each of these settings in the configuration file and modify them as needed:
cluster_name: 'Name'
initial_token: Token
seed_provider:
    - seeds: "Seed IP"
listen_address: Droplet's IP
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
Replace "Name" with your cluster name, "Token" with the number you generated earlier for that node, "Seed IP" with your seed node's IP, and "Droplet's IP" with that droplet's own IP address. Do this for each node. Here is an example filled in for a 3-node setup:
Node 0
cluster_name: 'MyDigitalOceanCluster'
initial_token: 0
seed_provider:
    - seeds: "198.211.xxx.0"
listen_address: 198.211.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Node 1
cluster_name: 'MyDigitalOceanCluster'
initial_token: 3074457345618258602
seed_provider:
    - seeds: "198.211.xxx.0"
listen_address: 192.241.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Node 2
cluster_name: 'MyDigitalOceanCluster'
initial_token: 6148914691236517205
seed_provider:
    - seeds: "198.211.xxx.0"
listen_address: 37.139.xxx.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
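Since only initial_token and listen_address change from node to node, the per-node values can be generated rather than typed by hand. The following Python sketch prints the settings to copy into each node's cassandra.yaml. It is an illustration only: `render_settings` is a hypothetical helper, the 192.0.2.x addresses are placeholders from the range reserved for documentation, and the script does not edit any file itself.

```python
def render_settings(cluster_name, seed_ip, node_ips, tokens):
    """Return one block of cassandra.yaml settings per node, in node order."""
    if len(node_ips) != len(tokens):
        raise ValueError("need exactly one token per node")
    blocks = []
    for ip, token in zip(node_ips, tokens):
        blocks.append("\n".join([
            f"cluster_name: '{cluster_name}'",   # same on every node
            f"initial_token: {token}",           # differs per node
            "seed_provider:",
            f'    - seeds: "{seed_ip}"',         # same on every node
            f"listen_address: {ip}",             # differs per node
            "rpc_address: 0.0.0.0",
            "endpoint_snitch: RackInferringSnitch",
        ]))
    return blocks


if __name__ == "__main__":
    # Placeholder addresses; substitute your own droplets' IPs.
    ips = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
    tokens = [0, 3074457345618258602, 6148914691236517205]
    blocks = render_settings("MyDigitalOceanCluster", ips[0], ips, tokens)
    for index, block in enumerate(blocks):
        print(f"Node {index}\n{block}\n")
```

Here the first address doubles as the seed, matching the example above where Node 0 is the seed node.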
To start Cassandra, type in:
sudo sh ~/cassandra/bin/cassandra
Run this on the seed node first and, when it has finished starting, repeat it on each of the other nodes. If you don't see any errors, your multi-node Cassandra setup should be successfully deployed.
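One rough way to judge when the seed node has finished starting is to wait until it accepts TCP connections on its inter-node port. The Python sketch below does that. It assumes the default storage_port of 7000 is unchanged in cassandra.yaml, `wait_for_port` is a hypothetical helper, and the address in the usage example is a placeholder. An open port shows only that the process is listening, not that the node is fully ready.

```python
import socket
import time


def wait_for_port(host, port=7000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    seed_ip = "192.0.2.10"  # placeholder; use your seed node's IP
    if wait_for_port(seed_ip):
        print("Seed node is listening; start the next node.")
    else:
        print("Seed node did not open its port in time; check its logs.")
```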
http://db.tt/S5wHPN4f is a broken link… can you update it?
Hi,
I have a question about setting up a multi-node cluster: the cluster name is the same on every node, but we have not configured the IPs of all the nodes in any cassandra.yaml file. In that case, how does a node decide which nodes to connect to?
Was very useful! Thank you